spectral clustering
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > China > Guangdong Province > Shenzhen (0.05)
- Asia > China > Hong Kong (0.04)
- (2 more...)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Boosting Spectral Clustering on Incomplete Data via Kernel Correction and Affinity Learning
Spectral clustering has gained popularity for clustering non-convex data due to its simplicity and effectiveness. It is essential to construct a similarity graph using a high-quality affinity measure that models the local neighborhood relations among the data samples. However, incomplete data can lead to inaccurate affinity measures, resulting in degraded clustering performance. To address these issues, we propose an imputation-free framework with two novel approaches to improve spectral clustering on incomplete data. Firstly, we introduce a new kernel correction method that enhances the quality of the kernel matrix estimated on incomplete data with a theoretical guarantee, benefiting classical spectral clustering on pre-defined kernels. Secondly, we develop a series of affinity learning methods that equip the self-expressive framework with $\ell_p$-norm to construct an intrinsic affinity matrix with an adaptive extension. Our methods outperform existing data imputation and distance calibration techniques on benchmark datasets, offering a promising solution to spectral clustering on incomplete data in various real-world applications.
Fully Bayesian Spectral Clustering and Benchmarking with Uncertainty Quantification for Small Area Estimation
In this work, inspired by machine learning techniques, we propose a new Bayesian model for Small Area Estimation (SAE), the Fay-Herriot model with Spectral Clustering (FH-SC). Unlike traditional approaches, clustering in FH-SC is based on spectral clustering algorithms that utilize external covariates, rather than geographical or administrative criteria. A major advantage of the FH-SC model is its flexibility in integrating existing SAE approaches, with or without clustering random effects. To enable benchmarking, we leverage the theoretical framework of posterior projections for constrained Bayesian inference and derive closed form expressions for the new Rao-Blackwell (RB) estimators of the posterior mean under the FH-SC model. Additionally, we introduce a novel measure of uncertainty for the benchmarked estimator, the Conditional Posterior Mean Square Error (CPMSE), which is generalizable to other Bayesian SAE estimators. We conduct model-based and data-based simulation studies to evaluate the frequentist properties of the CPMSE. The proposed methodology is motivated by a real case study involving the estimation of the proportion of households with internet access in the municipalities of Colombia. Finally, we also illustrate the advantages of FH-SC over existing Bayesian and frequentist approaches through our case study.
- Africa > Sub-Saharan Africa (0.14)
- South America > Colombia > La Guajira Department > Riohacha (0.04)
- Asia > Middle East > Jordan (0.04)
- (7 more...)
- Research Report (0.64)
- Workflow (0.46)
- Health & Medicine (0.67)
- Government (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)
An Improved and Generalised Analysis for Spectral Clustering
We revisit the theoretical performances of Spectral Clustering, a classical algorithm for graph partitioning that relies on the eigenvectors of a matrix representation of the graph. Informally, we show that Spectral Clustering works well as long as the smallest eigenvalues appear in groups well separated from the rest of the matrix representation's spectrum. This arises, for example, whenever there exists a hierarchy of clusters at different scales, a regime not captured by previous analyses. Our results are very general and can be applied beyond the traditional graph Laplacian. In particular, we study Hermitian representations of digraphs and show Spectral Clustering can recover partitions where edges between clusters are oriented mostly in the same direction. This has applications in, for example, the analysis of trophic levels in ecological networks. We demonstrate that our results accurately predict the performances of Spectral Clustering on synthetic and real-world data sets.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > China > Guangdong Province > Shenzhen (0.05)
- Asia > China > Hong Kong (0.04)
- (2 more...)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Advanced spectral clustering for heterogeneous data in credit risk monitoring systems
Han, Lu, Li, Mengyan, Qiang, Jiping, Su, Zhi
Heterogeneous data, which encompass both numerical financial variables and textual records, present substantial challenges for credit monitoring. To address this issue, we propose Advanced Spectral Clustering (ASC), a method that integrates financial and textual similarities through an optimized weight parameter and selects eigenvectors using a novel eigenvalue-silhouette optimization approach. Evaluated on a dataset comprising 1,428 small and medium-sized enterprises (SMEs), ASC achieves a Silhouette score that is 18% higher than that of a single-type data baseline method. Furthermore, the resulting clusters offer actionable insights; for instance, 51% of low-risk firms are found to include the term 'social recruitment' in their textual records. The robustness of ASC is confirmed across multiple clustering algorithms, including k-means, k-medians, and k-medoids, with ΔIntra/Inter < 0.13 and ΔSilhouette Coefficient < 0.02. By bridging spectral clustering theory with heterogeneous data applications, ASC enables the identification of meaningful clusters, such as recruitment-focused SMEs exhibiting a 30% lower default risk, thereby supporting more targeted and effective credit interventions.
- Banking & Finance > Credit (1.00)
- Banking & Finance > Risk Management (0.70)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Unsupervised Learning: Comparative Analysis of Clustering Techniques on High-Dimensional Data
Baligodugula, Vishnu Vardhan, Amsaad, Fathi
--This paper presents a comprehensive comparative analysis of prominent clustering algorithms--K-means, DB-SCAN, and Spectral Clustering--on high-dimensional datasets. We introduce a novel evaluation framework that assesses clustering performance across multiple dimensionality reduction techniques (PCA, t-SNE, and UMAP) using diverse quantitative metrics. Experiments conducted on MNIST, Fashion-MNIST, and UCI HAR datasets reveal that preprocessing with UMAP consistently improves clustering quality across all algorithms, with Spectral Clustering demonstrating superior performance on complex manifold structures. Our findings show that algorithm selection should be guided by data characteristics, with K-means excelling in computational efficiency, DBSCAN in handling irregular clusters, and Spectral Clustering in capturing complex relationships. This research contributes a systematic approach for evaluating and selecting clustering techniques for high-dimensional data applications.
- Asia > Singapore (0.05)
- Asia > Pakistan > Sindh > Karachi Division > Karachi (0.04)
- Asia > Middle East > Jordan (0.04)
Multi-View Spectral Clustering for Graphs with Multiple View Structures
Tsitsikas, Yorgos, Papalexakis, Evangelos E.
Despite the fundamental importance of clustering, to this day, much of the relevant research is still based on ambiguous foundations, leading to an unclear understanding of whether or how the various clustering methods are connected with each other. In this work, we provide an additional stepping stone towards resolving such ambiguities by presenting a general clustering framework that subsumes a series of seemingly disparate clustering methods, including various methods belonging to the widely popular spectral clustering framework. In fact, the generality of the proposed framework is additionally capable of shedding light to the largely unexplored area of multi-view graphs where each view may have differently clustered nodes. In turn, we propose GenClus: a method that is simultaneously an instance of this framework and a generalization of spectral clustering, while also being closely related to k-means as well. This results in a principled alternative to the few existing methods studying this special type of multi-view graphs. Then, we conduct in-depth experiments, which demonstrate that GenClus is more computationally efficient than existing methods, while also attaining similar or better clustering performance. Lastly, a qualitative real-world case-study further demonstrates the ability of GenClus to produce meaningful clusterings.
- Asia (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- North America > United States > California > Riverside County > Riverside (0.04)
- (3 more...)
Boosting Spectral Clustering on Incomplete Data via Kernel Correction and Affinity Learning
Spectral clustering has gained popularity for clustering non-convex data due to its simplicity and effectiveness. It is essential to construct a similarity graph using a high-quality affinity measure that models the local neighborhood relations among the data samples. However, incomplete data can lead to inaccurate affinity measures, resulting in degraded clustering performance. To address these issues, we propose an imputation-free framework with two novel approaches to improve spectral clustering on incomplete data. Firstly, we introduce a new kernel correction method that enhances the quality of the kernel matrix estimated on incomplete data with a theoretical guarantee, benefiting classical spectral clustering on pre-defined kernels. Secondly, we develop a series of affinity learning methods that equip the self-expressive framework with \ell_p -norm to construct an intrinsic affinity matrix with an adaptive extension.
Consistent Spectral Clustering in Hyperbolic Spaces
Clustering, as an unsupervised technique, plays a pivotal role in various data analysis applications. Among clustering algorithms, Spectral Clustering on Euclidean Spaces has been extensively studied. However, with the rapid evolution of data complexity, Euclidean Space is proving to be inefficient for representing and learning algorithms. Although Deep Neural Networks on hyperbolic spaces have gained recent traction, clustering algorithms or non-deep machine learning models on non-Euclidean Spaces remain underexplored. In this paper, we propose a spectral clustering algorithm on Hyperbolic Spaces to address this gap. Hyperbolic Spaces offer advantages in representing complex data structures like hierarchical and tree-like structures, which cannot be embedded efficiently in Euclidean Spaces. Our proposed algorithm replaces the Euclidean Similarity Matrix with an appropriate Hyperbolic Similarity Matrix, demonstrating improved efficiency compared to clustering in Euclidean Spaces. Our contributions include the development of the spectral clustering algorithm on Hyperbolic Spaces and the proof of its weak consistency. We show that our algorithm converges at least as fast as Spectral Clustering on Euclidean Spaces. To illustrate the efficacy of our approach, we present experimental results on the Wisconsin Breast Cancer Dataset, highlighting the superior performance of Hyperbolic Spectral Clustering over its Euclidean counterpart. This work opens up avenues for utilizing non-Euclidean Spaces in clustering algorithms, offering new perspectives for handling complex data structures and improving clustering efficiency.
- North America > United States > Wisconsin (0.25)
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York (0.04)
- (7 more...)